Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application

Author

Sean Roberts

Abstract

Software specializers and hardware accelerators share the goal of decreasing the runtime of an operation while remaining parameterizable and hiding the underlying optimizations from users. Because candidate hardware accelerators compete for reconfigurable hardware resources, tuning must take place at the application level rather than at the operation level, as is the case for software specializers. This paper presents a methodology for co-tuning software specializers and hardware accelerators so that both may be used simultaneously in an application. To explore the validity of this approach, experiments were carried out with software-specialized and hardware-accelerated 2D stencils performing convolutions for trial convolutional neural networks. The results demonstrate that an application-level co-tuner can discover which operations are best suited to software specializers and which merit the limited reconfigurable hardware resources required for hardware acceleration.
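The application-level selection described in the abstract can be illustrated with a minimal sketch. This is not the paper's tuner: the `Op` record, the `co_tune` function, the exhaustive search, and all runtimes and resource costs below are hypothetical, chosen only to show why the choice has to be made across the whole application rather than per operation.

```python
from itertools import combinations
from typing import NamedTuple


class Op(NamedTuple):
    """One operation in the application (all values are made-up illustration data)."""
    name: str
    sw_time: float  # runtime when handled by a software specializer
    hw_time: float  # runtime when handled by a hardware accelerator
    hw_cost: int    # reconfigurable resources the accelerator would occupy


def co_tune(ops, budget):
    """Exhaustively choose which ops go to hardware so total runtime is minimal
    while the summed hardware cost stays within the resource budget."""
    best_time = sum(op.sw_time for op in ops)  # baseline: everything in software
    best_hw = frozenset()
    for r in range(1, len(ops) + 1):
        for subset in combinations(ops, r):
            if sum(op.hw_cost for op in subset) > budget:
                continue  # this combination does not fit on the device
            chosen = {op.name for op in subset}
            total = sum(op.hw_time if op.name in chosen else op.sw_time for op in ops)
            if total < best_time:
                best_time, best_hw = total, frozenset(chosen)
    return best_hw, best_time


# Hypothetical three-layer CNN with a resource budget of 10 units.
ops = [
    Op("conv1", sw_time=10.0, hw_time=4.0, hw_cost=6),
    Op("conv2", sw_time=8.0, hw_time=3.0, hw_cost=5),
    Op("conv3", sw_time=3.0, hw_time=2.5, hw_cost=4),
]
hw_ops, total_time = co_tune(ops, budget=10)
print(sorted(hw_ops), total_time)
```

With these made-up numbers, the two operations with the largest individual speedups (`conv1` and `conv2`) cannot both fit in the budget, so the search has to weigh combinations against each other. Exhaustive search is used only because the example is tiny; a practical tuner would need a cheaper search strategy.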

Similar resources

Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC

Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices has necessitated real-time, energy-efficient deep neural network inference on...

Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing

Convolutional neural networks (CNNs) are one of the most successful machine learning techniques for image, voice, and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators have been proposed for CNNs which typically contain large numbers of multiply-accumulate (MAC) units, the multipliers of which are large in integrated circuit (IC) gate ...

Compilation and Parallelization Techniques with Tool Support to Realize Sequence Alignment Algorithm on FPGA and Multicore

Reconfigurable computing (RC), such as computing using field-programmable gate array (FPGA) technology, has been shown to accelerate a large variety of applications. RC fills the gap between hardware and software, achieving higher performance than software while maintaining a remarkable amount of flexibility. Though there are bottlenecks associated w...

Towards Computational Efficiency of Next Generation Multimedia Systems

High throughput demands under complexity- and power-efficiency constraints have imposed numerous design challenges for next-generation multimedia systems. Multimedia (especially video) applications impose tight throughput constraints (e.g., frame resolutions beyond 1920×1080, at more than 30 FPS), which must be met by possibly resource- and battery-constrained underlying hardware. However, technology scalin...

Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce

Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via...

Publication date: 2016
